Searching with quantization: approximate nearest neighbor search using short codes and distance estimators
Authors
Abstract
We propose an approximate nearest neighbor search method based on quantization. In particular, it uses a product quantizer to produce short codes and corresponding distance estimators that approximate the Euclidean distance between the original vectors. The method is best used asymmetrically, by computing the distance between a vector and a code, unlike competing techniques such as spectral hashing, which only compare codes. Because it approximates the Euclidean distance from memory-efficient codes, the approach permits efficient nearest neighbor search. Experiments on SIFT and GIST image descriptors show excellent search accuracy, and the method outperforms two state-of-the-art approaches from the literature. Timings measured when searching a set of 2 billion vectors are excellent given the high accuracy of the method.
Key-words: nearest neighbor search, large databases, quantization
This technical report is in submission.
inria-00410767, version 1, 24 Aug 2009
French title: Quantifier pour chercher: recherche approximative par codes compacts et estimateurs de distances
Résumé (translated from French): We propose an approximate search method that estimates the distance between two vectors using short quantized codes. These codes are defined jointly with their estimators, which approximate the Euclidean distance between two vectors. The method can estimate the distance between two vectors from their respective codes. Unlike competing techniques, it can also be used asymmetrically, with a distance estimator that takes a vector and a code as input, which improves the quality of the estimate. We show that our approach gives results significantly above the state of the art in terms of the trade-off between memory usage and search quality. Search times measured on a set of 2 billion SIFT vectors demonstrate the practical value of the method.
Keywords (translated from French): nearest neighbor search, large databases, Euclidean distance, quantization
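The asymmetric use described in the abstract (comparing an unquantized query vector against stored codes) can be illustrated with a small sketch. This is not the authors' implementation: the function names (`train_pq`, `encode`, `adc_table`, `adc_distance`), the plain k-means training, and the toy Gaussian data are all assumptions made for illustration, and the parameters `m` (number of subvectors) and `k` (centroids per subquantizer) are hypothetical choices.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd k-means (an assumed training method for each subquantizer)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[j].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def split(v, m):
    """Cut a vector into m contiguous subvectors of equal length."""
    step = len(v) // m
    return [v[i * step:(i + 1) * step] for i in range(m)]

def train_pq(vectors, m, k):
    """Learn one codebook of k centroids per subvector position."""
    subsets = zip(*(split(v, m) for v in vectors))
    return [kmeans(list(sub), k, seed=i) for i, sub in enumerate(subsets)]

def encode(v, codebooks):
    """Short code: the index of the nearest centroid for each subvector."""
    return [min(range(len(cb)), key=lambda c: sq_dist(s, cb[c]))
            for s, cb in zip(split(v, len(codebooks)), codebooks)]

def decode(code, codebooks):
    """Reconstruct the quantized vector by concatenating selected centroids."""
    return [x for idx, cb in zip(code, codebooks) for x in cb[idx]]

def adc_table(query, codebooks):
    """Per-subvector squared distances from the raw query to every centroid."""
    return [[sq_dist(s, c) for c in cb]
            for s, cb in zip(split(query, len(codebooks)), codebooks)]

def adc_distance(table, code):
    """Asymmetric estimate: sum table lookups; the query is never quantized."""
    return sum(row[idx] for row, idx in zip(table, code))

# Toy data: 200 random 8-dimensional vectors, m=4 subvectors, k=16 centroids.
rng = random.Random(1)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
codebooks = train_pq(data, m=4, k=16)
codes = [encode(v, codebooks) for v in data]

query = [rng.gauss(0, 1) for _ in range(8)]
table = adc_table(query, codebooks)
best = min(range(len(codes)), key=lambda i: adc_distance(table, codes[i]))
```

In this sketch each database vector is stored as 4 small integers instead of 8 floats, and the distance table is built once per query, so scoring a code costs only `m` lookups and additions. The asymmetric estimate equals the exact squared distance between the query and the reconstructed (decoded) database vector, so only the database side incurs quantization error.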
Similar resources
Composite Quantization for Approximate Nearest Neighbor Search
This paper presents a novel compact coding approach, composite quantization, for approximate nearest neighbor search. The idea is to use the composition of several elements selected from the dictionaries to accurately approximate a vector and to represent the vector by a short code composed of the indices of the selected elements. To efficiently compute the approximate distance of a query to a ...
Approximate Nearest Neighbor Search by Residual Vector Quantization
A recently proposed product quantization method is efficient for large scale approximate nearest neighbor search, however, its performance on unstructured vectors is limited. This paper introduces residual vector quantization based approaches that are appropriate for unstructured vectors. Database vectors are quantized by residual vector quantizer. The reproductions are represented by short cod...
Polysemous Codes
This paper considers the problem of approximate nearest neighbor search in the compressed domain. We introduce polysemous codes, which offer both the distance estimation quality of product quantization and the efficient comparison of binary codes with Hamming distance. Their design is inspired by algorithms introduced in the 90’s to construct channel-optimized vector quantizers. At search time,...
Nearest Neighbor Search using Kd-trees
We suggest a simple modification to the kd-tree search algorithm for nearest neighbor search resulting in an improved performance. The Kd-tree data structure seems to work well in finding nearest neighbors in low dimensions but its performance degrades even if the number of dimensions increases to more than three. Since the exact nearest neighbor search problem suffers from the curse of dimensi...
Learning Better Encoding for Approximate Nearest Neighbor Search with Dictionary Annealing
We introduce a novel dictionary optimization method for high-dimensional vector quantization employed in approximate nearest neighbor (ANN) search. Vector quantization methods first seek a series of dictionaries, then approximate each vector by a sum of elements selected from these dictionaries. An optimal series of dictionaries should be mutually independent, and each dictionary should generat...